bugfix - divide-by-zero error #161
Walkthrough
The changes in this pull request involve modifications to multiple scripts within the …
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (7)
harpy/bin/depth_windows.py (3)
26-28: Good addition of zero check, but needs error logging.
The protection against division by zero is good, but we should log when we skip windows to help diagnose issues.
 if WINSIZE > 0:
     depth = _SUM / WINSIZE
     print(f"{LAST_CONTIG}\t{POSITION}\t{depth}", file = sys.stdout)
+else:
+    print(f"Warning: Skipping window for {LAST_CONTIG} due to zero window size", file=sys.stderr)
Line range hint 7-11: Add input validation for windowsize.
The script should validate that the windowsize argument is positive before processing any data.
 parser.add_argument('windowsize', type= int, help = "The window size to use to calcualte mean depth over (non-overlapping)")
 args = parser.parse_args()
+if args.windowsize <= 0:
+    print("Error: Window size must be greater than zero", file=sys.stderr)
+    sys.exit(1)
+
 _SUM = 0
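An alternative to checking the value after parsing is to let argparse reject it during parsing. The sketch below is not code from this PR: `positive_int` and `build_parser` are hypothetical names, and the help text is paraphrased. It only illustrates how a custom `type` callback could enforce the same constraint.

```python
import argparse


def positive_int(value):
    """argparse type callback (hypothetical helper): accept only integers > 0."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer")
    if number <= 0:
        raise argparse.ArgumentTypeError(
            f"window size must be greater than zero, got {number}"
        )
    return number


def build_parser():
    """Build a parser whose windowsize argument is validated while parsing."""
    parser = argparse.ArgumentParser(
        description="Mean depth over non-overlapping windows"
    )
    parser.add_argument(
        "windowsize",
        type=positive_int,
        help="The window size to calculate mean depth over (non-overlapping)",
    )
    return parser
```

With this approach, argparse prints the usage message plus the error text and exits with status 2 when the value is zero, negative, or not an integer, so no separate check is needed after `parse_args()`.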
Line range hint 1-44: Consider adding error handling and logging.
The script would benefit from proper error handling and logging to help diagnose issues in production:
- Add try-except blocks around file operations
- Use Python's logging module
- Add progress indicators for long-running operations
Would you like me to provide an example implementation with these improvements?
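As a rough illustration of the first two suggestions, the sketch below wraps file reading in a try-except block and reports problems through the logging module. It is not the actual depth_windows.py: the function names, the logger name, and the three-column tab-separated input (contig, position, depth) are all assumptions made for the example.

```python
import logging

logger = logging.getLogger("depth_windows")  # logger name is an assumption


def mean_depth(total, winsize, contig):
    """Return total / winsize, or None (after logging a warning) if winsize <= 0."""
    if winsize <= 0:
        logger.warning("Skipping window for %s: window size is %d", contig, winsize)
        return None
    return total / winsize


def read_depth_records(path):
    """Yield (contig, position, depth) from a three-column tab-separated file."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                fields = line.rstrip("\n").split("\t")
                try:
                    contig, position, depth = fields
                    record = (contig, int(position), int(depth))
                except ValueError:
                    # Wrong column count or a non-integer field: report and move on.
                    logger.error("Skipping malformed line %d in %s", line_number, path)
                    continue
                yield record
    except OSError as err:
        # Log the failure with context, then let the caller decide what to do.
        logger.error("Could not read %s: %s", path, err)
        raise
```

Logging and re-raising keeps the failure visible to the caller (for example a workflow rule) while still leaving a diagnostic message behind.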
harpy/bin/assign_mi.py (1)
86-89: Consider defining the default BX value as a constant.
The implementation correctly removes MI and DI tags while adding a default BX tag. However, the hardcoded value "A00C00B00D00" could be defined as a constant at the module level for better maintainability.
Consider adding this at the top of the file:
+# Default BX tag value for records with missing BX tags
+DEFAULT_BX_VALUE = "A00C00B00D00"
Then update the write_missingbx function:
-    tags.append(("BX", "A00C00B00D00"))
+    tags.append(("BX", DEFAULT_BX_VALUE))
harpy/bin/deconvolve_alignments.py (3)
25-25: Avoid using 'input' as an argument name to prevent shadowing
Using 'input' as the argument name shadows the built-in input() function in Python, which might lead to unexpected behavior.
Consider renaming the argument to 'input_file'.
Apply this diff:
-parser.add_argument('input', help = "Input coordinate-sorted bam/sam file")
+parser.add_argument('input_file', help = "Input coordinate-sorted bam/sam file")
...
-bam_input = args.input
+bam_input = args.input_file
52-71: Resolve contradiction regarding MI tag handling in write_invalidbx
The docstring states that the function keeps the existing MI tag if present, but the comment contradicts this by stating it will not keep an existing MI tag. Please ensure the documentation and code behavior are consistent.
Update the docstring or adjust the code to accurately reflect the function's behavior.
101-102: Simplify nested if statements by combining them
To improve code readability, you can combine the nested if statements into a single condition using and.
Apply this diff to simplify the code:
-if bam_input.lower().endswith(".bam"):
-    if not os.path.exists(bam_input + ".bai"):
+if bam_input.lower().endswith(".bam") and not os.path.exists(bam_input + ".bai"):
🧰 Tools
🪛 Ruff
101-102: Use a single if statement instead of nested if statements
Combine if statements using and (SIM102)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
harpy/_cli_types_generic.py (2 hunks)
harpy/bin/assign_mi.py (2 hunks)
harpy/bin/deconvolve_alignments.py (1 hunks)
harpy/bin/depth_windows.py (2 hunks)
🧰 Additional context used
🪛 Ruff
harpy/bin/deconvolve_alignments.py
101-102: Use a single if statement instead of nested if statements
Combine if statements using and (SIM102)
129-129: Do not use bare except (E722)
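For the E722 finding, a bare except also catches KeyboardInterrupt and SystemExit, which can hide real failures and make a script hard to interrupt. The code at line 129 of deconvolve_alignments.py is not shown in this review, so the sketch below is purely illustrative: `read_tag` is a hypothetical helper and a plain dict stands in for an alignment record's tags.

```python
def read_tag(tags, name, default=None):
    """Look up a tag, catching only the exception the lookup can raise."""
    try:
        return tags[name]
    except KeyError:
        # Only a missing tag is treated as "no value"; anything else propagates.
        return default
```

Naming the exception keeps unexpected errors (and Ctrl-C) visible instead of silently turning them into the default value.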
🔇 Additional comments (6)
harpy/bin/assign_mi.py (2)
Line range hint 65-89: Verify if these changes resolve the ZeroDivisionError.
While the changes improve tag handling consistency by removing MI tags from invalid/missing BX records, we should verify if this fully addresses the ZeroDivisionError mentioned in issue #160. Could you provide more context about how these changes prevent the divide-by-zero condition in the molecule_coverage rule?
Let's check the molecule_coverage rule implementation:
#!/bin/bash
# Search for the molecule_coverage rule and its dependencies
rg -l "molecule_coverage" | while read -r file; do
echo "=== $file ==="
rg "rule molecule_coverage" "$file" -A 10
done
# Search for potential division operations in related files
rg "ZeroDivisionError|/ 0|\b/\s*0\b"
65-67: Verify the impact of MI tag removal.
The change to remove existing MI tags from invalid BX records is well-documented and implemented correctly. However, since this changes the previous behavior, we should verify that downstream processes handle records without MI tags appropriately.
Let's check for any downstream dependencies on MI tags:
✅ Verification successful
Based on the comprehensive code analysis, I can now provide a final response:
Removing MI tags from invalid BX records is safe and correct
The change to remove existing MI tags from invalid BX records is appropriate because:
- The molecule_coverage.py script doesn't use MI tags at all - it works with BX stats data
- The only downstream usage of MI tags is in concatenate_bam.py, which properly handles records with missing MI tags
- Invalid BX records shouldn't have MI tags as they don't represent valid molecules, which is consistent with the tool's documentation
The change aligns with the codebase's design where MI tags are only assigned to valid molecular identifiers, making the behavior more consistent and preventing potential issues with invalid molecular assignments.
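The tag handling discussed in this review (dropping MI and DI tags, and adding a default BX tag for records with a missing BX) could be sketched on plain (tag, value) tuples as follows. This is an illustration under assumptions, not the code in assign_mi.py: `strip_molecule_tags` is a hypothetical name, and `DEFAULT_BX_VALUE` is the constant proposed earlier in the review.

```python
# Default BX tag value for records with missing BX tags (value taken from the review).
DEFAULT_BX_VALUE = "A00C00B00D00"


def strip_molecule_tags(tags, add_default_bx=False):
    """Return a copy of tags without MI/DI, optionally appending a default BX tag.

    tags is a list of (name, value) tuples, used here as a stand-in for the
    tag list of an alignment record.
    """
    kept = [(name, value) for name, value in tags if name not in ("MI", "DI")]
    if add_default_bx:
        kept.append(("BX", DEFAULT_BX_VALUE))
    return kept
```

For example, `strip_molecule_tags([("MI", 3), ("NM", 0)])` keeps only the NM tag, and passing `add_default_bx=True` appends the default BX tag after the remaining tags.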
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for code that relies on MI tags
rg -l "MI" | grep -v "assign_mi.py" | while read -r file; do
echo "=== $file ==="
rg "get_tag\(['\"]MI['\"]" "$file" -A 2
done
Length of output: 1791
Script:
#!/bin/bash
# Let's check for MI tag usage in molecule_coverage.py since it's related to the divide-by-zero error
echo "=== molecule_coverage.py content ==="
cat harpy/bin/molecule_coverage.py
# Also check for any other files that might process MI tags
echo -e "\n=== Files referencing MI tags ==="
rg -l "MI" --type py
# Check specific MI tag processing
echo -e "\n=== MI tag processing in Python files ==="
rg "MI" --type py -B2 -A2
Length of output: 19671
harpy/_cli_types_generic.py (3)
67-67: LGTM! Clear error handling for unsupported file types.
The validation check and error message effectively guide users when encountering unsupported file types.
116-116: Whitespace change.
Line range hint 67-116: Verify the relevance to the divide-by-zero fix.
These changes appear unrelated to the main PR objective of fixing the ZeroDivisionError in the molecule_coverage rule. Let's verify if this file is in the relevant code path.
harpy/bin/deconvolve_alignments.py (1)
168-169: Ensure correct handling of negative distances in dist calculation
The calculation dist = pos_start - d[bx]["lastpos"] may result in negative distances if pos_start is less than d[bx]["lastpos"]. Ensure that the logic accounts for this scenario and handles negative distances appropriately.
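One possible policy, shown here only as a sketch, is to treat a negative distance as an overlap and clamp it to zero before comparing against a distance cutoff. Whether that is the right behavior for deconvolve_alignments.py is not established by this review; `gap_since_last`, `starts_new_molecule`, and the `cutoff` parameter are hypothetical names.

```python
def gap_since_last(pos_start, lastpos):
    """Distance from the previous alignment's last position, never negative.

    A negative raw value means the new alignment starts before the previous
    one ended (an overlap), which this sketch treats as a gap of zero.
    """
    return max(0, pos_start - lastpos)


def starts_new_molecule(pos_start, lastpos, cutoff):
    """True when the gap exceeds the cutoff, i.e. the barcode likely marks a new molecule."""
    return gap_since_last(pos_start, lastpos) > cutoff
```

Clamping makes overlapping alignments compare as adjacent rather than producing a negative value that could slip past a "greater than cutoff" test in unexpected ways.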
fixes #160